{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Keras LSTM for IMDB Sentiment Classification\n", "\n", "This is a simple example of how to explain a Keras LSTM model using DeepExplainer." ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "scrolled": false }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Loading data...\n", "25000 train sequences\n", "25000 test sequences\n", "Pad sequences (samples x time)\n", "x_train shape: (25000, 80)\n", "x_test shape: (25000, 80)\n", "Build model...\n", "Train...\n", "Train on 25000 samples, validate on 25000 samples\n", "Epoch 1/15\n", "25000/25000 [==============================] - 113s 5ms/step - loss: 0.4577 - acc: 0.7825 - val_loss: 0.3970 - val_acc: 0.8246\n", "Epoch 2/15\n", "25000/25000 [==============================] - 110s 4ms/step - loss: 0.3048 - acc: 0.8752 - val_loss: 0.3794 - val_acc: 0.8330\n", "Epoch 3/15\n", "25000/25000 [==============================] - 109s 4ms/step - loss: 0.2210 - acc: 0.9129 - val_loss: 0.4197 - val_acc: 0.8300\n", "Epoch 4/15\n", "25000/25000 [==============================] - 113s 5ms/step - loss: 0.1557 - acc: 0.9433 - val_loss: 0.4687 - val_acc: 0.8279\n", "Epoch 5/15\n", "25000/25000 [==============================] - 114s 5ms/step - loss: 0.1057 - acc: 0.9615 - val_loss: 0.6095 - val_acc: 0.8240\n", "Epoch 6/15\n", "25000/25000 [==============================] - 136s 5ms/step - loss: 0.0790 - acc: 0.9720 - val_loss: 0.7360 - val_acc: 0.8177\n", "Epoch 7/15\n", "25000/25000 [==============================] - 127s 5ms/step - loss: 0.0755 - acc: 0.9746 - val_loss: 0.6201 - val_acc: 0.8180\n", "Epoch 8/15\n", "25000/25000 [==============================] - 121s 5ms/step - loss: 0.0436 - acc: 0.9854 - val_loss: 0.8128 - val_acc: 0.8169\n", "Epoch 9/15\n", "25000/25000 [==============================] - 110s 4ms/step - loss: 0.0312 - acc: 
0.9895 - val_loss: 0.9553 - val_acc: 0.8145\n", "Epoch 10/15\n", "25000/25000 [==============================] - 114s 5ms/step - loss: 0.0283 - acc: 0.9909 - val_loss: 0.9576 - val_acc: 0.8126\n", "Epoch 11/15\n", "25000/25000 [==============================] - 108s 4ms/step - loss: 0.0172 - acc: 0.9949 - val_loss: 0.9107 - val_acc: 0.8117\n", "Epoch 12/15\n", "25000/25000 [==============================] - 108s 4ms/step - loss: 0.0156 - acc: 0.9954 - val_loss: 0.9634 - val_acc: 0.8096\n", "Epoch 13/15\n", "25000/25000 [==============================] - 110s 4ms/step - loss: 0.0119 - acc: 0.9962 - val_loss: 1.0733 - val_acc: 0.8123\n", "Epoch 14/15\n", "25000/25000 [==============================] - 113s 5ms/step - loss: 0.0117 - acc: 0.9964 - val_loss: 1.1165 - val_acc: 0.8106\n", "Epoch 15/15\n", "25000/25000 [==============================] - 111s 4ms/step - loss: 0.0107 - acc: 0.9970 - val_loss: 1.0867 - val_acc: 0.8091\n", "25000/25000 [==============================] - 17s 688us/step\n", "Test score: 1.0867270610725879\n", "Test accuracy: 0.80912\n" ] } ], "source": [ "# This model training code is directly from:\n", "# https://github.com/keras-team/keras/blob/master/examples/imdb_lstm.py\n", "\n", "\"\"\"Trains an LSTM model on the IMDB sentiment classification task.\n", "The dataset is actually too small for LSTM to be of any advantage\n", "compared to simpler, much faster methods such as TF-IDF + LogReg.\n", "# Notes\n", "- RNNs are tricky. 
Choice of batch size is important,\n", "choice of loss and optimizer is critical, etc.\n", "Some configurations won't converge.\n", "- LSTM loss decrease patterns during training can be quite different\n", "from what you see with CNNs/MLPs/etc.\n", "\"\"\"\n", "\n", "from keras.datasets import imdb\n", "from keras.layers import LSTM, Dense, Embedding\n", "from keras.models import Sequential\n", "from keras.preprocessing import sequence\n", "\n", "max_features = 20000\n", "maxlen = 80 # cut texts after this number of words (among top max_features most common words)\n", "batch_size = 32\n", "\n", "print(\"Loading data...\")\n", "(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)\n", "print(len(x_train), \"train sequences\")\n", "print(len(x_test), \"test sequences\")\n", "\n", "print(\"Pad sequences (samples x time)\")\n", "x_train = sequence.pad_sequences(x_train, maxlen=maxlen)\n", "x_test = sequence.pad_sequences(x_test, maxlen=maxlen)\n", "print(\"x_train shape:\", x_train.shape)\n", "print(\"x_test shape:\", x_test.shape)\n", "\n", "print(\"Build model...\")\n", "model = Sequential()\n", "model.add(Embedding(max_features, 128))\n", "model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))\n", "model.add(Dense(1, activation=\"sigmoid\"))\n", "\n", "# try using different optimizers and different optimizer configs\n", "model.compile(loss=\"binary_crossentropy\", optimizer=\"adam\", metrics=[\"accuracy\"])\n", "\n", "print(\"Train...\")\n", "model.fit(x_train, y_train, batch_size=batch_size, epochs=15, validation_data=(x_test, y_test))\n", "score, acc = model.evaluate(x_test, y_test, batch_size=batch_size)\n", "print(\"Test score:\", score)\n", "print(\"Test accuracy:\", acc)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Explain the model with DeepExplainer and visualize the first prediction" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "import shap\n", "\n", "# we use the 
first 100 training examples as our background dataset to integrate over\n", "explainer = shap.DeepExplainer(model, x_train[:100])\n", "\n", "# explain the first 10 predictions\n", "# explaining each prediction requires 2 * background dataset size runs\n", "shap_values = explainer.shap_values(x_test[:10])" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "
\n", " " ], "text/plain": [ "" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# init the JS visualization code\n", "shap.initjs()\n", "\n", "# transform the indexes to words\n", "import numpy as np\n", "\n", "words = imdb.get_word_index()\n", "# imdb.load_data() shifts every word index by index_from=3 (0 = padding, 1 = start, 2 = out-of-vocabulary),\n", "# so apply the same offset when inverting the word index\n", "num2word = {}\n", "for w in words.keys():\n", "    num2word[words[w] + 3] = w\n", "x_test_words = np.stack([np.array([num2word.get(x, \"NONE\") for x in x_test[i]]) for i in range(10)])\n", "\n", "# plot the explanation of the first prediction\n", "# Note: the model output has shape (n_samples, 1), so SHAP treats it as multi-output with a single output column (hence the [0] indexing)\n", "shap.force_plot(explainer.expected_value[0], shap_values[0][0], x_test_words[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that each sample is an IMDB review, represented as a sequence of word indices. This means \"feature 0\" is the first word of the review, which differs from one review to the next. Calling `summary_plot` would therefore combine the importance of all words by their position in the text rather than by word identity. That is probably not what you want for a global measure of feature importance, which is why we have not called `summary_plot` here. If you do want a global summary of each word's importance, you could pull apart the feature attribution values and group them by word." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.5" } }, "nbformat": 4, "nbformat_minor": 2 }